{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Upstage\n",
    "\n",
    "- Author: [Sun Hyoung Lee](https://github.com/LEE1026icarus)\n",
    "- Peer Review : [Pupba](https://github.com/pupba), [DoWoung Kong](https://github.com/krkrong)\n",
    "- Proofread : [Youngjun cho](https://github.com/choincnp)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/08-Embedding/04-UpstageEmbeddings.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/08-Embedding/04-UpstageEmbeddings.ipynb)\n",
    "## Overview\n",
    "\n",
    "'Upstage' is a Korean startup specializing in artificial intelligence (AI) technology, particularly in large language models (LLM) and document AI.\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environement Setup](#environment-setup)\n",
    "\n",
    "\n",
    "### References\n",
    "\n",
    "- [Upstage API docs](https://console.upstage.ai/docs/getting-started/overview)\n",
    "- [Upstage Embeddings](https://console.upstage.ai/docs/capabilities/embeddings)\n",
    "---\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    " **[Note]** \n",
    "- ```langchain-opentutorial``` is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. \n",
    "- You can checkout the [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details.\n",
    "\n",
    "### API Key Configuration\n",
    "To use ```UpstageEmbeddings``` , you need to [obtain a Upstage API key](https://console.upstage.ai/api-keys).\n",
    "\n",
    "Once you have your API key, set it as the value for the variable ```UPSTAGE_API_KEY``` ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "%pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\"langchain_community\"],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "# Set environment variables\n",
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        \"UPSTAGE_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"CH08-Embeddings-UpstageEmebeddings\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can alternatively set ```UPSTAGE_API_KEY``` in ```.env``` file and load it.\n",
    "\n",
    "[Note] This is not necessary if you've already set ```UPSTAGE_API_KEY``` in previous steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv(override=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "texts = [\n",
    "    \"Hello, nice to meet you.\",\n",
    "    \"LangChain simplifies the process of building applications with large language models\",\n",
    "    \"The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.\",\n",
    "    \"LangChain simplifies the process of building applications with large-scale language models.\",\n",
    "    \"Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\",\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Check Supported Embedding Models**\n",
    "\n",
    "- https://developers.upstage.ai/docs/apis/embeddings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Model Information**\n",
    "\n",
    "| Model                              | Release Date | Context Length | Description                                                                                         |\n",
    "|------------------------------------|--------------|----------------|-----------------------------------------------------------------------------------------------------|\n",
    "| embedding-query      | 2024-05-10   | 4000           | A Solar-base Query Embedding model with a 4k context limit. This model is optimized for embedding user queries in information retrieval tasks such as search and re-ranking. |\n",
    "| embedding-passage    | 2024-05-10   | 4000           | A Solar-base Passage Embedding model with a 4k context limit. This model is optimized for embedding documents or texts for retrieval purposes. |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_upstage import UpstageEmbeddings\n",
    "\n",
    "# Query-Only Embedding Model\n",
    "query_embeddings = UpstageEmbeddings(model=\"embedding-query\")\n",
    "\n",
    "# Sentence-Only Embedding Model\n",
    "passage_embeddings = UpstageEmbeddings(model=\"embedding-passage\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Embed the ```query```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4096"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Query Embedding\n",
    "embedded_query = query_embeddings.embed_query(\n",
    "    \" Please provide detailed information about LangChain. \"\n",
    ")\n",
    "# Print embedding dimension\n",
    "len(embedded_query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Embed the document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Document Embedding\n",
    "embedded_documents = passage_embeddings.embed_documents(texts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The similarity calculation results are displayed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Query] Tell me about LangChain.\n",
      "====================================\n",
      "[0] Similarity: 0.535 | LangChain simplifies the process of building applications with large-scale language models.\n",
      "\n",
      "[1] Similarity: 0.519 | LangChain simplifies the process of building applications with large language models\n",
      "\n",
      "[2] Similarity: 0.509 | The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.\n",
      "\n",
      "[3] Similarity: 0.230 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.\n",
      "\n",
      "[4] Similarity: 0.158 | Hello, nice to meet you.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Question (embedded_query): Tell me about LangChain.\n",
    "similarity = np.array(embedded_query) @ np.array(embedded_documents).T\n",
    "\n",
    "# Sort by similarity in descending order\n",
    "sorted_idx = (np.array(embedded_query) @ np.array(embedded_documents).T).argsort()[::-1]\n",
    "\n",
    "# Display results\n",
    "print(\"[Query] Tell me about LangChain.\\n====================================\")\n",
    "for i, idx in enumerate(sorted_idx):\n",
    "    print(f\"[{i}] Similarity: {similarity[idx]:.3f} | {texts[idx]}\")\n",
    "    print()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-kr-bpXWMSjn-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
